Automated diagnostic and remediation workflow generation for Red Hat Enterprise Linux (RHEL) systems using AI and MCP (Model Context Protocol) servers.
Powered by OpenRouter: Access Claude, GPT-4, Gemini, and other leading models through a single API.
This project provides an end-to-end pipeline that:
- Diagnoses system issues from error logs using Red Hat's Security Data API and Knowledge Base
- Generates executable remediation workflows with proper error handling and approvals
- Produces workflow definitions compatible with the Nexus Workflow Engine
Error Logs
↓
[Diagnostic Agent]
↓ (uses MCP servers)
├── Red Hat Security Data API (CVEs, advisories)
└── Red Hat Knowledge Base (solutions, articles)
↓
Diagnosis (root causes + remediation steps)
↓
[Workflow Generator]
↓
Executable Workflow Definition (JSON)
↓
[Workflow Engine] (execution - not included)
See ARCHITECTURE.md for detailed visual diagrams of the entire process flow, MCP server architecture, agentic research loop, and data flow.
- ✅ Agentic Research: LLM autonomously researches using MCP tools
- ✅ Multi-source Intelligence: Combines CVE data + KB articles
- ✅ Structured Output: Generates valid workflow JSON
- ✅ Risk Assessment: Assigns risk levels and approval requirements
- ✅ Retry Policies: Automatic retry configuration based on risk
- ✅ Checkpointing: Saves diagnosis and workflow at each stage
redhat-diagnostic-workflow/
├── mcp_servers/
│ ├── redhat_security_server.py # MCP server for Security Data API
│ └── redhat_kb_server.py # MCP server for Knowledge Base API
├── diagnostic_agent/
│ ├── diagnostic_agent.py # Main diagnostic agent
│ ├── workflow_generator.py # Workflow generator
│ └── pipeline.py # Complete orchestration pipeline
├── examples/
│ ├── nginx_openssl_error.log # Example: nginx segfault
│ └── systemd_timeout_error.log # Example: systemd/PostgreSQL issue
├── requirements.txt # Python dependencies
├── .env.example # Environment variable template
├── test_redhat_access.py # Red Hat API connectivity test
├── run_demo.sh # Demo script
├── README.md # This file
├── QUICKSTART.md # Quick start guide
├── OPENROUTER.md # OpenRouter setup and usage
├── TESTING.md # Testing guide and troubleshooting
├── ARCHITECTURE.md # Visual architecture and process flow
└── CHANGELOG.md # Project changelog
- Python 3.10+
- OpenRouter API key (required)
  - Access to Claude, GPT-4, Gemini, and more
  - No waitlist, pay-as-you-go pricing
  - See OPENROUTER.md for setup and model selection
- (Optional) Red Hat Customer Portal credentials for authenticated KB access
- Nexus Workflow Schema (if using with Nexus)
cd ~/scratch/redhat-diagnostic-workflow
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
Edit .env and add your API key:
# OpenRouter API Key (required)
OPENROUTER_API_KEY=sk-or-v1-...
# Optional: Red Hat credentials for KB access
REDHAT_USERNAME=your-username
REDHAT_PASSWORD=your-password
Note: Red Hat credentials are only needed for authenticated KB endpoints. The Security Data API is public.
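As a rough illustration of how a script could consume these settings, the sketch below reads them from the process environment and fails early when the required key is missing. It assumes the values from .env have already been exported into the environment. `Settings` and `load_settings` are hypothetical names, not part of this project's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in .env."""
    openrouter_api_key: str
    redhat_username: Optional[str] = None
    redhat_password: Optional[str] = None

    @property
    def has_redhat_credentials(self) -> bool:
        # Both values are needed for authenticated KB endpoints.
        return bool(self.redhat_username and self.redhat_password)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping (defaults to the process environment)."""
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is required; set it in .env")
    return Settings(
        openrouter_api_key=key,
        redhat_username=env.get("REDHAT_USERNAME") or None,
        redhat_password=env.get("REDHAT_PASSWORD") or None,
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise without touching real credentials.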
Before running the full pipeline, you can test connectivity to Red Hat APIs without requiring an LLM API key:
# Activate virtual environment
source venv/bin/activate
# Run Red Hat API access tests
python test_redhat_access.py
This will test:
- Red Hat Security Data API (public, no auth required)
  - CVE lookups
  - Security advisories
  - Package vulnerability searches
- Red Hat Knowledge Base API (optional auth)
  - KB article search
  - Solution lookups
- MCP server file checks
Expected output:
========================================
RED HAT API ACCESS TEST
========================================
TEST 1: Red Hat Security Data API (Public)
========================================
Get specific CVE
Testing CVE lookup for CVE-2024-6387 (OpenSSH vulnerability)
URL: https://access.redhat.com/labs/securitydataapi/cve/CVE-2024-6387.json
SUCCESS (200 OK)
CVE ID: CVE-2024-6387
Severity: High
CVSS3 Score: 8.1
...
TOTAL: 5/5 tests passed
All tests passed! Red Hat API access is working.
If tests fail, check:
- Internet connectivity to access.redhat.com
- Firewall settings
- Red Hat credentials (for KB API tests)
See TESTING.md for detailed testing guide and troubleshooting.
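For a quick manual check outside the test script, a CVE lookup can be sketched with the standard library alone. The base URL is copied from the expected output above. `cve_url` and `fetch_cve` are hypothetical helper names, and the shape of the returned JSON is not assumed here.

```python
import json
import re
import urllib.request

# Base URL as shown in the expected test output above.
SECURITY_DATA_BASE = "https://access.redhat.com/labs/securitydataapi"
CVE_ID_PATTERN = re.compile(r"^CVE-\d{4}-\d{4,}$")


def cve_url(cve_id: str) -> str:
    """Build the JSON endpoint URL for one CVE, rejecting malformed IDs."""
    cve_id = cve_id.strip().upper()
    if not CVE_ID_PATTERN.match(cve_id):
        raise ValueError(f"not a CVE identifier: {cve_id!r}")
    return f"{SECURITY_DATA_BASE}/cve/{cve_id}.json"


def fetch_cve(cve_id: str, timeout: float = 10.0) -> dict:
    """Fetch the CVE record (requires network access to access.redhat.com)."""
    with urllib.request.urlopen(cve_url(cve_id), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_cve("CVE-2024-6387")` would request the same URL the test script reports. An HTTP error here points at connectivity or firewall issues rather than at the pipeline.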
# Activate virtual environment
source venv/bin/activate
# Run pipeline with example error log
python diagnostic_agent/pipeline.py \
--logs examples/nginx_openssl_error.log \
--schema /path/to/workflow-definition.schema.json \
--output-dir ./output
# Use custom session ID
python diagnostic_agent/pipeline.py \
--logs examples/systemd_timeout_error.log \
--schema /path/to/workflow-definition.schema.json \
--session-id "incident-2025-12-03-001" \
--output-dir ./output
# Pass error message directly (not from file)
python diagnostic_agent/pipeline.py \
--logs "nginx segfault in libssl.so" \
--schema /path/to/workflow-definition.schema.json
After running, the pipeline creates:
output/
└── incident-20251203-142345/
├── error_logs.txt # Original error logs
├── diagnosis.json # Diagnostic results
├── workflow.json # Generated workflow
└── summary.json # Complete session summary
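The layout above can be sketched as follows. This is an illustration under assumptions, not the pipeline's actual code: `default_session_id` and `write_session` are hypothetical names, the timestamp format is inferred from the example directory name, and the contents of summary.json are a guess.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional


def default_session_id(now: Optional[datetime] = None) -> str:
    """Build an ID shaped like the example directory name above."""
    now = now or datetime.now()
    return now.strftime("incident-%Y%m%d-%H%M%S")


def write_session(output_dir, session_id: str, error_logs: str,
                  diagnosis: dict, workflow: dict) -> Path:
    """Write the four artifacts into output_dir/session_id and return that path."""
    session_dir = Path(output_dir) / session_id
    session_dir.mkdir(parents=True, exist_ok=True)
    (session_dir / "error_logs.txt").write_text(error_logs, encoding="utf-8")
    artifacts = {
        "diagnosis.json": diagnosis,
        "workflow.json": workflow,
        # Assumed summary shape; the real file may contain more fields.
        "summary.json": {"session_id": session_id,
                         "diagnosis": diagnosis,
                         "workflow": workflow},
    }
    for name, payload in artifacts.items():
        (session_dir / name).write_text(json.dumps(payload, indent=2),
                                        encoding="utf-8")
    return session_dir
```

Writing each artifact as its own file means a failed workflow-generation step still leaves the diagnosis on disk for inspection.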
mcp_servers/redhat_security_server.py provides access to the Red Hat Security Data API:
Tools:
- search_cve: Search CVEs by ID or package name
- get_rhsa: Get security advisory details
- search_affected_packages: Find affected packages for a CVE
- get_errata: Get errata information
Example standalone usage:
python mcp_servers/redhat_security_server.py
mcp_servers/redhat_kb_server.py provides access to the Red Hat Customer Portal KB:
Tools:
- search_kb: Search KB articles
- get_kb_article: Get full article by ID
- search_solutions: Search for error message solutions
- search_by_symptom: Search by symptom description
Example standalone usage:
export REDHAT_USERNAME=your-username
export REDHAT_PASSWORD=your-password
python mcp_servers/redhat_kb_server.py
See examples/README.md for complete documentation of all 9 example scenarios.
Input (examples/nginx_openssl_error.log):
ERROR nginx: worker process exited on signal 11 (core dumped)
ERROR kernel: nginx[1234]: segfault in libssl.so.1.1
Diagnosis:
- Root cause: Vulnerable OpenSSL 1.1.1k (CVE-XXXX-YYYY)
- Severity: High
- Evidence: Segfault in libssl + CVE match
Generated Workflow:
- Backup nginx configuration (script, low risk)
- Stop nginx service (script, high risk, requires approval)
- Upgrade OpenSSL (ansible, high risk, requires approval)
- Restart nginx (script, medium risk)
- Verify health (API call, low risk)
Input (examples/systemd_timeout_error.log):
ERROR systemd: postgresql.service: Start operation timed out
ERROR postgresql: could not open file: Permission denied
ERROR selinux: AVC denial: denied read access
Diagnosis:
- Root cause: SELinux context mismatch on PostgreSQL data directory
- Severity: Medium
- Evidence: Permission denied + AVC denial
Generated Workflow:
- Check current SELinux context (script, low risk)
- Restore correct SELinux context (script, medium risk, requires approval)
- Restart PostgreSQL (script, medium risk)
- Verify database accessibility (API call, low risk)
The generated workflows match the Nexus Workflow Engine schema:
{
"schemaVersion": "1.0.0",
"version": 1,
"metadata": {
"name": "auto-remediation-20251203-142345",
"description": "Fix nginx segfault due to CVE-XXXX-YYYY",
"tags": ["auto-remediation", "redhat", "CVE-XXXX-YYYY"]
},
"triggers": [{"type": "manual", "requiresApproval": true}],
"workflow": {
"activities": [...]
}
}
Retry policies are configured automatically based on risk level:
- High risk: 1 attempt, fixed backoff
- Medium risk: 2 attempts, exponential backoff
- Low risk: 3 attempts, exponential backoff
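The mapping above can be expressed as a small lookup table. This is a minimal sketch: `RetryPolicy`, `RETRY_POLICIES`, and `retry_policy_for` are hypothetical names, and the field names are not taken from the Nexus schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int
    backoff: str  # "fixed" or "exponential"


# Values mirror the list above; riskier steps get fewer automatic retries.
RETRY_POLICIES = {
    "high": RetryPolicy(max_attempts=1, backoff="fixed"),
    "medium": RetryPolicy(max_attempts=2, backoff="exponential"),
    "low": RetryPolicy(max_attempts=3, backoff="exponential"),
}


def retry_policy_for(risk: str) -> RetryPolicy:
    """Return the retry policy for a risk level, case-insensitively."""
    try:
        return RETRY_POLICIES[risk.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown risk level: {risk!r}") from None
```

Rejecting unknown risk levels, rather than defaulting to the most permissive policy, avoids silently retrying a step whose risk was never assessed.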
Activities requiring approval:
- All high-risk operations
- Service restarts
- Package upgrades
- Manual intervention steps
Approval timeout: 10 minutes (configurable)
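The approval rules above reduce to a single predicate. The sketch below assumes each activity carries a risk level and a category. The category names in `APPROVAL_KINDS` and the function `requires_approval` are hypothetical and chosen only for illustration.

```python
# Default approval timeout from the text above: 10 minutes.
APPROVAL_TIMEOUT_SECONDS = 10 * 60

# Hypothetical category labels for the activity types listed above.
APPROVAL_KINDS = frozenset({"service_restart", "package_upgrade", "manual"})


def requires_approval(risk: str, kind: str) -> bool:
    """True for any high-risk activity or any activity in a gated category."""
    return risk.strip().lower() == "high" or kind.strip().lower() in APPROVAL_KINDS
```

For example, `requires_approval("medium", "package_upgrade")` is true because of the category, while `requires_approval("low", "backup")` is false.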
Solution: Set Red Hat credentials in .env:
REDHAT_USERNAME=your-username
REDHAT_PASSWORD=your-password
Cause: The CVE may not affect Red Hat products or hasn't been analyzed yet.
Solution: The agent will fall back to KB article search.
Solution: Ensure Python path is correct in diagnostic_agent.py:
StdioServerParameters(
    command="python", # or "python3"
    args=["path/to/server.py"]
)
Cause: Generated workflow doesn't match schema.
Solution: Check workflow-definition.schema.json path and ensure it's the correct version.
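Before reaching for the full schema, a quick structural check can narrow down what is wrong with a generated workflow. The sketch below only looks for the top-level keys that appear in the example workflow in this README. Whether the Nexus schema actually requires each of them is an assumption, and `structural_problems` is a hypothetical helper.

```python
# Top-level keys taken from the example workflow shown in this README.
REQUIRED_TOP_LEVEL_KEYS = ("schemaVersion", "version", "metadata",
                           "triggers", "workflow")


def structural_problems(definition: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks right."""
    problems = [
        f"missing top-level key: {key}"
        for key in REQUIRED_TOP_LEVEL_KEYS
        if key not in definition
    ]
    workflow = definition.get("workflow")
    if workflow is not None:
        # The example nests the step list under workflow.activities.
        activities = workflow.get("activities") if isinstance(workflow, dict) else None
        if not isinstance(activities, list):
            problems.append("workflow.activities must be a list")
    return problems
```

This is not a substitute for validating against workflow-definition.schema.json. It only separates "the generator produced the wrong shape" from "the schema path or version is wrong".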
- Red Hat Security Data API: public, no authentication required. Rate limit: reasonable use (no official limit documented).
- Red Hat Knowledge Base API: authentication required for some endpoints. Rate limit: not publicly documented.
pytest tests/
- Edit mcp_servers/redhat_security_server.py or redhat_kb_server.py
- Add new tool to @app.list_tools()
- Implement handler in @app.call_tool()
- Update agent prompt in diagnostic_agent.py
- Execution: Workflow execution not implemented (generates definitions only)
- Ansible playbooks: Discovery works, but actual playbooks not included
- System access: Cannot directly query the failing system
- Context limits: Very large log files may need pre-processing
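For the context-limit case, one possible pre-processing step is to keep only error-level lines, drop exact duplicates, and cap the result. This is a sketch of one approach, not something the pipeline does today. `condense_logs` and the set of level keywords are assumptions.

```python
import re

# Assumed severity keywords; adjust to the log format at hand.
LEVEL_PATTERN = re.compile(r"\b(ERROR|CRITICAL|FATAL)\b")


def condense_logs(text: str, max_lines: int = 200) -> str:
    """Keep unique error-level lines in order, then the last max_lines of them."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    seen = set()
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line or not LEVEL_PATTERN.search(line) or line in seen:
            continue
        seen.add(line)
        kept.append(line)
    # Prefer the most recent entries when the cap is exceeded.
    return "\n".join(kept[-max_lines:])
```

Exact-match deduplication will not collapse lines that differ only by timestamp or PID, so logs with per-line timestamps may need those fields stripped first.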
- Integration with Ansible Galaxy for playbook discovery
- Real-time log streaming support
- Multi-server diagnostics (cluster-wide issues)
- Workflow execution engine integration
- Automated rollback on failure
- Metrics and observability
- Red Hat Security Data API
- Red Hat Customer Portal API
- Model Context Protocol (MCP)
- Anthropic Claude API
MIT License (or your preferred license)
For issues or questions:
- Check the troubleshooting section above
- Review example error logs in examples/
- Consult Red Hat API documentation
Built with: OpenRouter, MCP (Model Context Protocol), Red Hat APIs